Communications Psychology
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Communications Psychology's content profile, based on 20 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Nolan, C. R.; Le Pelley, M. E.; Garner, K. G.
The benefits of routines for daily functioning are widely acknowledged, yet, despite their apparent importance, methods for quantifying routine maintenance and the causes of their disruption remain lacking. Here, we propose a novel means of defining and quantifying routines (transition entropy). Using transition entropy, we show that routines can be robustly elicited on tasks that require searching through a grid of squares for a hidden target. Over two experiments (N=100 each), we show that use of routines--as quantified by transition entropy--is robustly perturbed by frequent switches between search grids, as locations specific to the currently irrelevant grid become competitive for selection. Using a normative model that tracks task dynamics, we show that disruption to routines can be attributed to reduced sensitivity to the odds of success for completing a task. This suggests that routine maintenance may be disrupted by over-sensitivity to a lack of reward early in routine performance, or by increased expectations regarding the utility of pursuing other tasks.
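As a rough sketch of how a transition-entropy measure of this kind could be computed (the function below is illustrative, not the authors' implementation; it assumes a routine is recorded as a sequence of visited grid locations):

    import numpy as np
    from collections import Counter

    def transition_entropy(visits):
        """Shannon entropy (bits) of the first-order transition distribution
        over a sequence of visited grid locations. Low entropy = stereotyped
        routine; high entropy = variable or disrupted search order."""
        pairs = Counter(zip(visits[:-1], visits[1:]))
        total = sum(pairs.values())
        probs = np.array([c / total for c in pairs.values()])
        return float(-(probs * np.log2(probs)).sum())

    # A rigid routine repeats the same few transitions -> low entropy
    routine = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
    # A perturbed search mixes many transitions -> higher entropy
    perturbed = [0, 2, 1, 3, 2, 0, 3, 1, 0, 3, 2, 1]
    print(transition_entropy(routine), transition_entropy(perturbed))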
Navarro, V. M.; Brugger, S.; Wolpe, N.; Harding, J.; Fletcher, P.; Teufel, C.
Predictive coding has influenced many conceptual accounts of delusions, the bizarre and distressing beliefs that accompany a range of neuropsychiatric conditions. However, these explanations remain incomplete and have rarely been tested directly using formal modelling. Here, we present a formal account of delusional beliefs based on hybrid predictive coding, which sheds light on the computational mechanisms underpinning the core features of delusions: thematic recurrence and imperviousness to contradictory evidence. In simulation experiments, we demonstrate that a combination of contextually inadequate initialisation of beliefs and excessive certainty (a hallmark of psychosis) triggers a reorganisation of the generative model relating observed events to hidden causes. This reorganisation enables the maintenance of delusional beliefs that are thematically stable, internally consistent with external inputs, and impervious to contradictory evidence, all without an increase in prediction error. Overall, our results suggest that delusions may arise not from faulty inference, as previously argued, but as an adaptive consequence of generative models learned under atypical conditions. These findings provide mechanistic insights into the computations underpinning delusions and have important implications for a novel therapeutic strategy in terms of re-training generative models.
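The abstract gives no equations, but the two ingredients it manipulates, belief initialisation and excessive certainty, can be illustrated with a minimal precision-weighted Gaussian update (a toy stand-in, not the authors' hybrid predictive coding model):

    def posterior_mean(mu_prior, pi_prior, obs, pi_obs):
        """Conjugate Gaussian update: the posterior mean is a
        precision-weighted average of prior belief and new evidence
        (pi_* are precisions, i.e. inverse variances)."""
        return (pi_prior * mu_prior + pi_obs * obs) / (pi_prior + pi_obs)

    # A belief initialised in the wrong place but held with excessive
    # certainty (high prior precision) barely moves under contradiction:
    print(posterior_mean(mu_prior=-5.0, pi_prior=100.0, obs=5.0, pi_obs=1.0))  # ~ -4.9
    # The same belief with ordinary certainty updates readily:
    print(posterior_mean(mu_prior=-5.0, pi_prior=1.0, obs=5.0, pi_obs=1.0))    # 0.0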
Atzert, C.; Dechterenko, F.; Lukavsky, J.; Busch, N. A.
Some images are consistently remembered better than others, suggesting that memorability reflects intrinsic image properties. We tested whether within-category distinctiveness underlies this effect. Across three experiments (N = 477), participants categorized indoor scenes previously rated for subjective typicality and then completed recognition memory tests. Typical scenes were categorized faster and more accurately, but were remembered worse and showed a more liberal response bias than atypical scenes. These opposing effects were robust across categories. To link subjective typicality to visual representations, we quantified image distinctiveness using a convolutional neural network (CNN). Across layers, CNN-derived distinctiveness closely tracked human typicality judgments and predicted both categorization speed and memorability, with strongest effects in higher, semantic layers. Critically, the memory advantage for atypical scenes persisted even when most images were atypical, ruling out rarity within the experimental context. Together, the results show that intrinsic scene memorability reflects an image's position within a category-specific representational space.
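One plausible way to operationalise CNN-derived within-category distinctiveness, sketched here with a pretrained torchvision ResNet (the network, layer choice, and distance metric are assumptions, not the authors' pipeline; image paths are hypothetical):

    import torch
    import torchvision.models as models
    from torchvision import transforms
    from PIL import Image

    # Pretrained CNN as a feature extractor (penultimate layer here; the
    # paper reports effects across layers, strongest in higher ones).
    net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    net.fc = torch.nn.Identity()   # drop the classifier head
    net.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    @torch.no_grad()
    def embed(paths):
        batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
        return net(batch)

    def distinctiveness(paths):
        """Distance of each scene from its category centroid in CNN feature
        space: larger = more atypical/distinctive within the category."""
        feats = embed(paths)
        centroid = feats.mean(dim=0, keepdim=True)
        return torch.linalg.vector_norm(feats - centroid, dim=1)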
Milham, M.; Low, D.; Erkent, A.; Trabulsi, J.; Kass, M. C.; Vos de Wael, R.; Yenepalli, S.; Wang, Y.; Leyden, M.; Jordan, C.; Salum, G.; Alexander, L.; Schubiner, G.; Hendrix, L.; Koyama, M.; Mears, L.; McAdams, R.; White, C.; Merikangas, K.; Satterthwaite, T. D.; Franco, A.; Klein, A.; Koplewicz, H.; Leventhal, B.; Freund, M.; Kiar, G.
Digital mental health applications enable high-frequency behavioral monitoring and scalable interventions. Journaling provides a therapeutically grounded and intrinsically engaging activity for many users. AI-based text analysis enables privacy-preserving phenotyping of clinically relevant patterns in naturalistic writing, including emotional distress and behavioral risk (e.g., indicators of intent, planning, or preparatory actions for harm to self or others). We evaluated a mobile journaling platform in an 8-week randomized controlled trial (N = 507) of young adults with mild-to-moderate anxiety and depression symptoms. Journaling produced modest reductions in anxiety relative to controls at the 8-week endpoint and 1-month follow-up (d = 0.16-0.19). Effects were small and did not remain significant after correction for multiple comparisons; complementary Bayesian models nonetheless provided moderate-to-strong directional evidence (90-97%) supporting a modest anxiety reduction. In parallel, behavioral phenotyping analyses showed that high-risk journal entries were more common among younger users (OR = 0.77 per year of age, p = 0.007). Text-based risk signals and self-reported energy exhibited significant circadian variation (e.g., risk probability was highest during late-night and overnight hours). Within-person analyses demonstrated strong short-term persistence in mood and risk states, with calm/relaxed showing the highest persistence and anxious/agitated exhibiting the lowest persistence. High-risk journal entries clustered temporally and were preceded by sustained low valence and energy. Although affective volatility was associated with acute declines within the same affective dimension (pleasantness or energy), it was not associated with escalation to high-risk states. Key behavioral dynamics observed in the trial were replicated in an independent general population dataset (N = 16,630). Collectively, these findings demonstrate that privacy-preserving digital journaling can support scalable longitudinal behavioral phenotyping and real-time risk monitoring while providing modest clinical benefit for anxiety symptoms.
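To unpack the reported age effect: an odds ratio of 0.77 per year means the odds of a high-risk entry shrink by roughly 23% with each additional year of age. A minimal sketch of such a model on simulated data (column names and effect sizes are assumptions, not the study's data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical frame: one row per journal entry, with a binary
    # high-risk flag and the author's age.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"age": rng.uniform(18, 35, 2000)})
    df["high_risk"] = rng.binomial(1, 1 / (1 + np.exp(-(2.0 - 0.26 * df["age"]))))

    fit = smf.logit("high_risk ~ age", data=df).fit(disp=False)
    odds_ratio = np.exp(fit.params["age"])
    print(odds_ratio)   # ~0.77: odds of a high-risk entry fall ~23% per year of age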
Engeser, M.; Babaei, N.; Kaiser, D.
Each individual person looks at natural scenes in their own unique way, resulting in a distinct perceptual experience of the world. However, little is known about why such differences in gaze emerge. Here, we test the hypothesis that idiosyncrasies in gaze behavior are predicted by inter-subject variations in internal models--expectations about how scenes typically look. In two experiments, we first characterized participants' personal internal models by asking them to draw typical bathroom and kitchen scenes. Individual differences in these drawings were quantified using an objective deep learning pipeline and, in turn, related to individual differences in gaze behavior. In Experiment 1, where participants freely viewed a set of kitchen and bathroom photographs, inter-subject similarities in internal models did not predict inter-subject similarities in gaze. In Experiment 2, we encouraged strategic exploration through gaze-contingent viewing and a memory task. Here, inter-subject similarities in internal models predicted similarities in fixation frequency and the sequence in which different object categories were inspected. These findings suggest that the influence of internal models on visual exploration is stronger under increased sensory uncertainty and when expectation-guided sampling of the environment is encouraged. Together, our results provide new insights into how individual expectations shape gaze behavior and help explain why people differ in how they explore the visual world.
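The similarity-of-similarities logic can be sketched as a second-order correlation between two inter-subject distance structures (shapes, metrics, and variable names below are assumptions, not the authors' pipeline):

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import spearmanr

    # Hypothetical per-participant vectors: a deep-net embedding of each
    # participant's typical-scene drawing, and a gaze summary (e.g.
    # fixation frequencies per object category).
    rng = np.random.default_rng(1)
    drawing_embeddings = rng.normal(size=(30, 128))
    gaze_profiles = rng.normal(size=(30, 20))

    # Inter-subject (dis)similarity structure in each space...
    drawing_dists = pdist(drawing_embeddings, metric="correlation")
    gaze_dists = pdist(gaze_profiles, metric="correlation")

    # ...and their second-order correspondence: do participants with
    # similar internal models also look at scenes in similar ways?
    rho, p = spearmanr(drawing_dists, gaze_dists)
    print(rho, p)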
Pham, T. Q.; Chikazoe, J.
Aesthetic preference is a primary driver of social behavior in the digital era, yet the extent to which these preferences are consistent across disparate domains remains poorly understood. We hypothesize that aesthetic judgment is governed by a domain-invariant latent structure, such that individuals who exhibit similar preferences in one category will demonstrate comparable alignment in seemingly unrelated domains. To test this, we recruited 37 participants to evaluate stimuli across three distinct aesthetic domains: art, faces (male and female), and scenes. We developed a novel computational framework that reformulates cross-domain preference as a user-based collaborative filtering problem, encoding individual profiles through inter-subject similarity matrices. Our model successfully predicted participant responses in a target domain based on their similarity to the cohort in a separate source domain. These results demonstrate robust cross-domain consistency, suggesting that aesthetic evaluation is mediated by an abstract, domain-general mechanism rather than being purely stimulus-dependent. We propose that this consistency is rooted in a shared neurophysiological pathway, likely involving the orbitofrontal cortex (OFC) and the Default Mode Network (DMN), and discuss how these findings provide a foundation for more sophisticated, cross-modal recommendation systems and the study of individual social identity.
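A minimal sketch of the user-based collaborative filtering idea, assuming simple cosine similarity in the source domain (the weighting scheme is an assumption; the paper's exact framework may differ):

    import numpy as np

    def predict_target_ratings(R_source, R_target, test_user):
        """User-based collaborative filtering across domains: weight other
        users' target-domain ratings by the test user's similarity to them,
        computed purely in the source domain."""
        # cosine similarity between the test user and the cohort (source domain)
        source = R_source / np.linalg.norm(R_source, axis=1, keepdims=True)
        sims = source @ source[test_user]
        sims[test_user] = 0.0                      # exclude self
        weights = np.clip(sims, 0, None)
        return weights @ R_target / weights.sum()  # weighted cohort average

    # Hypothetical ratings: 37 raters x items (art as source, faces as target)
    rng = np.random.default_rng(2)
    R_art, R_faces = rng.normal(size=(37, 50)), rng.normal(size=(37, 40))
    print(predict_target_ratings(R_art, R_faces, test_user=0)[:5])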
Ceolini, E.; Band, G.; Ghosh, A.
Fine-grained temporal structures emerge in smartphone behavioral recordings over multi-day periods. Complex systems research suggests that emergent temporal structures reflect underlying resource constraints of the system. Here we test whether cognitive abilities measured through speeded tasks (spanning fractions of a second) are reflected in emergent smartphone temporal structures spanning days, revealing how cognitive resource limitations shape naturalistic behavior. We analyzed smartphone tap interval patterns accumulated over several days and used decision tree regression models to predict performance in simple and choice reaction time tasks from these patterns. Simple reaction time was poorly predicted (R2 = 0.003), indicating that basic sensorimotor constraints play only a marginal role in shaping real-world behavioral timing. In contrast, choice reaction time was moderately predictable (R2 = 0.4), demonstrating that higher-order cognitive constraints prominently influence naturalistic temporal organization. Notably, while task performance operates at sub-second timescales, the predictive temporal patterns in smartphone behavior spanned milliseconds to several seconds and were accumulated over days, revealing the broad, multi-scale influence of cognitive resource constraints on everyday behavior. Both predicted and measured choice reaction times showed age-related decline, but the decline was more pronounced in predicted values, suggesting that age-related cognitive changes may be amplified in naturalistic contexts. These findings demonstrate that emergent temporal structures in smartphone use can reveal how cognitive processes measured using speeded tasks manifest, or fail to manifest, in real-world behavior, and that complex-systems approaches can thereby bridge laboratory and naturalistic assessments of cognition, revealing which cognitive processes meaningfully constrain real-world behavior.
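A bare-bones version of the prediction pipeline described, with simulated stand-ins for the tap-interval features (feature construction and model settings are assumptions):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score

    # Hypothetical features: per-user summaries of the smartphone
    # tap-interval distribution accumulated over days.
    rng = np.random.default_rng(3)
    tap_features = rng.normal(size=(500, 20))
    choice_rt = tap_features[:, :5].sum(axis=1) + rng.normal(scale=1.5, size=500)

    model = DecisionTreeRegressor(max_depth=4, random_state=0)
    r2 = cross_val_score(model, tap_features, choice_rt, cv=5, scoring="r2")
    print(r2.mean())   # the paper reports R^2 ~ 0.4 for choice RT vs ~0 for simple RT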
Diekmann, N.; Lissek, S.; Uengoer, M.; Cheng, S.
The progress of learning is usually quantified by averaging responses across participants and/or multiple trials within a block. However, such approaches obscure the trial-by-trial progress of learning, which has recently been shown to express a rich variety of dynamics. An alternative approach that does not suffer from this problem is the detection and analysis of points of behavioral change, i.e., change-point analysis. Using change-point analysis, we reanalyzed data from human participants in different predictive learning tasks in which learned contingencies underwent reversal. We find that responses of individual participants were more accurately characterized by behavioral change points than by the average learning curve. Importantly, change points significantly shifted to later trials during reversal learning, indicating that reversal learning is more difficult than initial learning. In a computational model based on deep reinforcement learning, we show that the change-point shift required the replay of previous experiences, which in turn depends on the hippocampus. This finding is consistent with studies showing that lesions of the hippocampus yield faster reversal learning. In summary, we reaffirm the importance of analyzing single-participant responses, show that phenomenological learning rates are slower during reversal learning, and provide a theoretical account for this difference.
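A minimal single change-point detector for binary trial-by-trial responses, in the spirit of the analysis described (the authors' change-point method may differ):

    import numpy as np

    def bernoulli_change_point(responses):
        """Most likely single change point in a 0/1 response sequence, by
        maximising the two-segment Bernoulli log-likelihood over splits."""
        r = np.asarray(responses, dtype=float)
        n = len(r)
        def loglik(x):
            p = np.clip(x.mean(), 1e-9, 1 - 1e-9)
            return x.sum() * np.log(p) + (len(x) - x.sum()) * np.log(1 - p)
        splits = range(2, n - 1)
        scores = [loglik(r[:k]) + loglik(r[k:]) for k in splits]
        return list(splits)[int(np.argmax(scores))]

    # Hypothetical learner: mostly wrong for 30 trials, then mostly correct
    rng = np.random.default_rng(4)
    resp = np.r_[rng.binomial(1, 0.2, 30), rng.binomial(1, 0.9, 30)]
    print(bernoulli_change_point(resp))   # should fall near trial 30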
Travi, F.; Mehta, A.; Castro, E.; Li, H.; Reinen, J.; Dhurandhar, A.; Meyer, P.; Fernandez Slezak, D.; Cecchi, G.; Polosecki, P.
A widespread view of neurodegenerative disorders, including Alzheimer's Disease (AD), frames their effects as accelerated aging, with the brain-age gap (BAG, the deviation of predicted brain age from chronological age) as a staple biomarker. However, BAG relies on a fundamental, untested assumption: that AD can be identified via age-invariant brain phenotypes. Using invariant representation learning on brain MRI from 44,178 individuals, we created neural representations that optimally convey age information (age-aware) or conversely remove it (age-invariant) while minimizing reconstruction distortion. We provide the first causal evidence that age information is necessary in brain biomarkers for AD detection: age-aware representations achieve competitive state-of-the-art performance and significantly outperform age-invariant ones (0.84 vs. 0.77 AUC, p < 0.001, with external validation). This necessity reveals a conceptual flaw in BAG: by subtracting chronological age, it discards the very information essential for accurate detection. Using conditional decoders to simulate aging trajectories, we found that healthy aging and AD operate along multiple independent anatomical dimensions (deep gray matter, frontoparietal, temporal). AD patients diverge from rather than accelerate healthy aging, showing pathological temporal shifts alongside, remarkably, relative frontoparietal preservation. Furthermore, representational similarity analysis suggests that even models pretrained on non-age tasks (e.g., sex or BMI) implicitly converge toward age-related features when optimized for AD. Given that the AD phenotype cannot be decoupled from age, our results establish a hard limit for age-independent biomarkers and favor multidimensional models that preserve aging structure over unidimensional summaries like BAG.
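For reference, the conventional BAG pipeline the paper critiques amounts to predicting age from brain features and keeping the residual; a compact sketch on simulated data (features and model are placeholders):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_predict

    # Hypothetical morphometric features and ages
    rng = np.random.default_rng(5)
    brain_features = rng.normal(size=(1000, 200))
    age = 50 + brain_features[:, :10].sum(axis=1) * 2 + rng.normal(scale=5, size=1000)

    predicted_age = cross_val_predict(Ridge(alpha=1.0), brain_features, age, cv=5)
    bag = predicted_age - age   # BAG: subtracting chronological age is exactly
    print(bag[:5])              # the step the paper argues discards AD-relevant signal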
Boger, T.; Firestone, C.
Some objects appear animate (e.g., dogs and elephants) while others do not (e.g., boots and sofas). This distinction pervades human cognition, with an expansive literature reporting striking effects of animacy on vision, memory, social perception, and neural organization. But studies of perceived animacy face a persistent challenge: Objects that differ in animacy tend to differ in many lower-level visual features (e.g., shape, texture, spatial frequency). Thus, it remains controversial whether animacy per se -- as opposed to its lower-level correlates -- drives visual processing. Here, we achieve previously unattainable levels of experimental control to demonstrate that the visual system represents animacy itself, beyond its lower-level covariates. We vary animacy while holding nearly all lower-level features constant by exploiting "visual anagrams" -- a diffusion-based technique for generating static images whose interpretations change radically with orientation. Seven pre-registered experiments leverage this approach to demonstrate that representations of animacy structure visual working memory and guide visual attention. Thus, the visual system extracts animacy itself, beyond its lower-level correlates.
Grasso, C. L.; Nalborczyk, L.; van Wassenhove, V.
Is there a geometry of time in the human mind? A canonical measure of time in psychology is duration, a time interval quantifiable as a magnitude. Durations have been proposed to be arranged along a mental timeline: a unidimensional, linear, and spatialised representation of time. Here, we asked whether such a mental timeline is sufficient to account for the experience of duration. To address this, we tested the same participants in two experiments: a behavioural similarity judgment task, in which participants rated the similarity of duration pairs, and an electroencephalography (EEG) experiment in which they detected oddball durations in a sequence. Behavioural and EEG data were used to construct representational dissimilarity matrices, whose geometry was compared against theoretical models of duration organisation. Our results reveal that most variance in behavioural similarity judgements is explained by three latent dimensions, interpretable as: magnitude (monotonic ordering of durations), contextual encoding (distance to the geometric mean of the duration set), and a periodic component. These three dimensions are jointly consistent with a latent generalised helical model, which provided excellent fit to the behavioural data. Individual helical model parameters further correlated with endogenous neural oscillations measured during rest, suggesting that an individual's duration space is partially constrained by intrinsic dynamics. The neural geometry was also found to be dynamic, unfolding in two successive stages: a strong logarithmic encoding of durations peaking around 150 ms after duration offset, followed by a spring-like geometry starting around 300 ms after offset. Together, these findings describe multidimensional psychological and neural geometries of duration space, and characterise their relationship.
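The representational-geometry comparison can be sketched as correlating candidate model RDMs with an observed RDM (the duration set, models, and noise below are illustrative; the paper additionally fits a helical model not reproduced here):

    import numpy as np
    from scipy.spatial.distance import squareform
    from scipy.stats import spearmanr

    durations = np.array([0.4, 0.6, 0.9, 1.35, 2.0])   # hypothetical duration set (s)

    def model_rdm(values):
        """Pairwise absolute differences as a candidate dissimilarity model."""
        return np.abs(values[:, None] - values[None, :])

    linear_rdm = model_rdm(durations)               # linear mental timeline
    log_rdm = model_rdm(np.log(durations))          # logarithmic magnitude code
    contextual_rdm = model_rdm(np.abs(np.log(durations) - np.log(durations).mean()))

    # Compare each model against an observed (here: simulated) behavioural RDM
    observed = log_rdm + np.random.default_rng(6).normal(scale=0.05, size=log_rdm.shape)
    observed = (observed + observed.T) / 2
    np.fill_diagonal(observed, 0)
    for name, rdm in [("linear", linear_rdm), ("log", log_rdm),
                      ("contextual", contextual_rdm)]:
        rho, _ = spearmanr(squareform(rdm), squareform(observed))
        print(name, round(rho, 2))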
WU, X. N.; Ren, X.; Dreher, J.-c.; Liu, C.
Children frequently intervene in social conflicts by punishing violators or helping victims, yet the motivational mechanisms underlying such third-party altruistic behavior remain poorly understood. It remains unclear how children balance fairness concerns against self-interest, how these motivations interact with intervention costs and impact on outcomes, and whether gender and individual differences reflect distinct motivational structures. Here, we applied the motive cocktail model, which assumes that altruistic behavior arises from multiple prosocial motives, to dissociate motivations underlying third-party interventions. We studied 229 children aged 8-12 years (123 boys), an age when fairness and inequality aversion are reliably expressed. The third-party intervention task manipulated inequality between others, the personal cost of intervention, its impact on outcomes, and the form of intervention (punishment versus helping). Children intervened more as inequality increased and less as intervention costs rose, indicating a trade-off between moral benefits and self-interest. Gender differences emerged only under high-cost and high-impact conditions, with boys engaging in more punishment interventions. The motive cocktail model outperformed alternative models and revealed that boys showed stronger aversion to disadvantageous inequality and a greater tendency to reverse victims' disadvantage than girls. Clustering analyses further identified distinct motivational profiles within each gender. These findings demonstrate that children's third-party altruistic behavior is governed by multiple dissociable motives. This study provides a mechanistic account of how social motivations are organized and weighted during late childhood.
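The abstract does not specify the motive cocktail model's functional form; purely as a flavour of this model class, a Fehr-Schmidt-style utility with cost and impact terms (all parameter names and values here are assumptions, not the paper's model):

    def intervention_utility(inequality, cost, impact,
                             alpha=0.8, theta=0.5, gamma=1.0):
        """Toy utility for a third-party intervention, in the spirit of
        inequality-aversion models (NOT the paper's motive cocktail model):
        alpha  - aversion to inequality between the two others
        theta  - weight on reversing the victim's disadvantage (scaled by impact)
        gamma  - sensitivity to the personal cost of intervening
        Intervene when utility > 0."""
        return alpha * inequality + theta * impact * inequality - gamma * cost

    # Interventions rise with inequality and fall with cost, as in the data:
    print(intervention_utility(inequality=4, cost=1, impact=1))   # positive -> intervene
    print(intervention_utility(inequality=1, cost=2, impact=1))   # negative -> abstain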
Tzionit, N.; Filmon, D. G.; Maeir, T.; Boettcher, S. E. P.; Nobre, A. C.; Shalev, N.; Landau, A. N.
Attention-deficit/hyperactivity disorder (ADHD) has been associated with atypical temporal processing across multiple cognitive domains. However, most evidence derives from simplified paradigms that isolate timing from spatial behaviour. Here, we examine how temporal prediction operates within a continuous, dynamic visual environment. Using the Dynamic Visual Search (DVS) task, we embedded spatiotemporal regularities into a sustained stream of visual events, allowing observers to implicitly learn and anticipate predictable targets. Continuous mouse tracking provided a fine-grained measure of action planning beyond discrete reaction time and accuracy metrics. Young adults diagnosed with ADHD (N=40) were compared to matched neurotypical controls (N=38). Both groups benefited from target predictability and reduced distractor load, indicating intact early spatiotemporal learning in ADHD. Across the duration of the task, however, the groups diverged. Neurotypical participants showed progressive increases in behavioural benefits from prediction, accompanied by increasingly direct and efficient mouse trajectories. In contrast, individuals with ADHD reached a plateau in prediction benefits midway through the experiment. Their performance remained stable, with minimal evidence of resource depletion, but did not show further optimisation based on learned regularities. These findings suggest that while prediction formation is preserved in ADHD, its progressive utilisation across longer timescales is attenuated. Rather than reflecting a primary deficit in learning or sustained attention, ADHD may involve altered long-timescale integration or weighting of predictive information in dynamic environments.
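A directness measure of the kind continuous mouse tracking affords can be as simple as path efficiency, the straight-line distance divided by the distance actually travelled (a generic metric, not necessarily the authors'):

    import numpy as np

    def path_efficiency(xy):
        """Directness of a mouse trajectory: straight-line distance from
        start to end divided by distance travelled (1.0 = perfectly direct)."""
        xy = np.asarray(xy, dtype=float)
        travelled = np.linalg.norm(np.diff(xy, axis=0), axis=1).sum()
        straight = np.linalg.norm(xy[-1] - xy[0])
        return straight / travelled if travelled > 0 else np.nan

    direct = [(0, 0), (1, 1), (2, 2), (3, 3)]
    meander = [(0, 0), (1, 2), (0, 3), (2, 1), (3, 3)]
    print(path_efficiency(direct), path_efficiency(meander))   # 1.0 vs < 1.0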
Chavanne, A. V.; Wang, Y.; de Boer, A. A. A.; Xu, B.; van Prooije, T. H.; Kapteijns, K. C. J.; Reniers, C.; Hernandez-Castillo, C. R.; Fernandez-Ruiz, J.; van de Warrenburg, B. P.; Diedrichsen, J.; Muetzel, R. L.; Marquand, A. F.
Brain disorders are often characterized by biological heterogeneity that is poorly captured by group-average analyses. Normative modeling has emerged as a promising tool to parse out such heterogeneity, yet existing lifespan reference models rely on coarse parcellations, which may obscure individual variability. Using an aggregated reference sample (n=58,597 scans from n=51,107 participants), we provide openly available normative models of brain morphometry at the voxel level across the lifespan, and we illustrate their potential utility with two complementary applications. First, we investigated long-term brain development after preterm birth across two independent cohorts (n=284; n=304) and found individualized, replicable and persistent brain alterations. Second, we extracted high-resolution patient-level morphometric deviations in two samples with rare, genetic neurodegenerative disorders (spinocerebellar ataxia types 1 and 3; n=29, n=15), which showed marked heterogeneity. Together, our findings highlight that voxelwise normative modeling can detect clinically relevant, individualized deviations from a reference model with high spatial precision.
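At its core, a normative-model deviation is a z-score of the observed measure against the model's predicted mean and spread for that individual; a voxelwise sketch (the 2.6 threshold is a common convention in this literature, assumed here rather than taken from the paper):

    import numpy as np

    def normative_deviation(y_obs, mu_pred, sigma_pred):
        """Voxelwise deviation from a normative model: the z-score of an
        individual's observed morphometry given the model's predicted mean
        and predictive SD for that person's age/sex/site."""
        return (y_obs - mu_pred) / sigma_pred

    # Hypothetical single subject across 100 voxels
    rng = np.random.default_rng(7)
    mu, sigma = rng.normal(size=100), rng.uniform(0.5, 1.5, 100)
    y = mu + sigma * rng.normal(size=100)
    z = normative_deviation(y, mu, sigma)
    print((np.abs(z) > 2.6).sum(), "voxels flagged as extreme deviations")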
Bilgin, S. N.; Kononowicz, T. W.; Giomo, D.; Mustafali, U.
Metacognition refers to the capacity to monitor one's own actions, internal states, and cognitive processes. A central question in cognitive neuroscience is whether metacognitive evaluation operates as a direct readout of performance signals or requires computationally independent neural mechanisms. Single-process theories propose that both arise from shared decision variables, while the Higher-Order Representation theory holds that metacognition requires re-representation through distinct computational processes. To test these frameworks, participants produced timed motor intervals and evaluated their own performance without external feedback, termed temporal error monitoring (TEM). Vision Transformer decoding applied to PCA-optimized single-trial EEG captured theta, alpha, and beta dynamics during both task phases. First-order timing was decodable from any individual frequency band, whereas second-order metacognitive inference required simultaneous integration across all three bands before action termination. Individuals whose metacognitive states were more accurately decoded showed stronger TEM precision, with no equivalent relationship observed for first-order performance decoding. These findings establish metacognitive evaluation as a computationally distinct process requiring higher-order multi-band neural integration rather than a direct readout of first-order timing signals.
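The single-band versus integrated-band comparison can be sketched with any decoder; here a logistic regression stands in for the paper's Vision Transformer (simulated features; the real analysis and signal structure will differ):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical single-trial band-power features (trials x features per band)
    rng = np.random.default_rng(8)
    n = 400
    state = rng.binomial(1, 0.5, n)                 # binary metacognitive judgment
    bands = {b: rng.normal(size=(n, 30)) for b in ("theta", "alpha", "beta")}
    for b in bands:                                 # weak signal in every band
        bands[b][:, 0] += 0.4 * state

    def decode(X):
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, state, cv=5, scoring="roc_auc").mean()

    for b, X in bands.items():
        print(b, round(decode(X), 2))               # each band alone
    print("all", round(decode(np.hstack(list(bands.values()))), 2))  # integrated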
Higashi, H.
Extracting stable individual traits from behavior observed across diverse contexts is a central challenge in behavioral modeling. We propose a framework for inferring domain-invariant individual latent representations by jointly encoding behaviors across multiple domains. Using large-scale telemetry data from professional Counter-Strike 2 gameplay, we demonstrate that these representations are stable across distinct environments and roles, improving behavior prediction in novel domains. Our analysis reveals that complex idiosyncratic movement policies can be effectively compressed into low-dimensional embeddings, with as few as two dimensions capturing the majority of individual strategic variation. Crucially, the learned latent space forms a structured metric space where Euclidean distances predict the degradation of transfer performance. Furthermore, we show that the latent axes align with interpretable behavioral phenotypes, such as risk-taking and social cohesion. These findings suggest that multi-domain integration is a robust method for uncovering the functional structure of latent individuality in complex decision-making tasks, bridging the gap between high-dimensional telemetry data and meaningful psychological constructs.
Fukui, H.
Background: The ageing of incarcerated populations is accelerating across high-income countries, yet dementia remains absent from routine correctional mental health statistics. We investigated whether correctional data systems in Japan, the United States, the United Kingdom, and Australia are structurally capable of detecting dementia in their prison populations.
Methods: We conducted a cross-national descriptive analysis of publicly available, aggregate-level correctional data. Japanese data comprised all newly admitted sentenced prisoners from 2006 to 2024 (approximately 390,000 individuals) from the Ministry of Justice Correctional Statistics Annual, including mental disorder classifications and CAPAS-derived work aptitude scores (used as a proxy for cognitive functioning; not clinical IQ measurements). US data were drawn from the Bureau of Justice Statistics Survey of Prison Inmates (2016). UK data were obtained from the Ministry of Justice Offender Management Statistics Quarterly (2015-2025). Australian data were sourced from the Australian Institute of Health and Welfare National Prisoner Health Data Collection (2022, n = 371). All analyses were descriptive; no inferential statistics were conducted.
Findings: Three distinct mechanisms rendered dementia statistically invisible across all four countries. First, in the United States and Australia, reliance on self-report instruments produced a paradox in which self-reported mental disorder prevalence declined with age: among US state prisoners, reported prevalence fell from 44.9% in the 35-44 age group to 31.9% among those aged 65 and older - the opposite of community epidemiological patterns. Second, in Japan - the only country with systematic cognitive assessment at prison admission - 35.0% of female theft offenders had work aptitude scores below 70, yet the classification system contains no dementia category; 43-52% of all detected mental disorders were absorbed into a residual "other" category even after a 2023 classification revision that added four new diagnostic categories but not dementia. Third, the United Kingdom lacks routine mental health prevalence data collection in prisons altogether. None of the four countries includes dementia as a standard correctional classification category.
Interpretation: Correctional mental health statistics across four high-income countries are structurally incapable of detecting dementia - not through clinical ignorance but by design: systems built for younger populations that have not been updated as prison demographics have changed. Japan's ageing female theft offender profile (39.4% aged 60 or older, 35.0% with low cognitive scores) represents a potential sentinel population for undetected cognitive impairment. Targeted interventions - cognitive screening at admission in the United States and Australia, introduction of a dementia classification category in Japan, and routine mental health data publication in the United Kingdom - are feasible with existing infrastructure. As prison populations continue to age, the statistical invisibility of dementia constitutes an escalating failure of health surveillance with direct consequences for clinical care, sentencing, and human rights.
Demetriou, I.; Correia, M.; Vidal-Pineiro, D.; Apsvalka, D.; Attaheri, A.; Emery, T.; Henson, R. N.
Cortical volume, a widely used marker of brain ageing, is the product of two genetically and developmentally dissociable morphometric features: thickness and area. However, it remains unclear whether these two features have dissociable consequences for cognitive ageing. To address this, we analyse cross-sectional and longitudinal neuroimaging and cognitive data from one discovery cohort (Cam-CAN) and two independent, pre-registered replication cohorts (OASIS-3 and HABS-HD), leveraging wide age ranges across adulthood, different follow-up intervals and diverse population backgrounds. We show that thickness declines more steeply with age than does area, and shows stronger associations with longitudinal change in fluid cognitive abilities, fairly uniformly across the cortex. Cognitive change is also dependent on baseline thickness, independent of thickness change and independent of baseline cognitive ability. In contrast, area is comparatively stable across adulthood, at least until old age, and shows weaker and more heterogeneous associations with cognitive change, despite being a stronger mediator of the effect of polygenic scores on baseline cognitive ability. Together, these findings help to reconcile inconsistencies in the literature, and indicate that thickness provides a more sensitive marker of dynamic neurobiological processes underlying cognitive ageing, whereas area seems to reflect primarily stable, trait-like variation in cognitive ability.
Wang, S.
Large Language Models achieve impressive accuracy on medical benchmarks that present clinical information as complete vignettes, but their behavior under sequential information delivery, the standard mode of real clinical practice, is poorly characterized. We conduct a three-condition ablation study (N=50 NEJM-derived cases, 150 total runs) using claude-sonnet-4-20250514 to investigate what happens when diagnostic information arrives in stages rather than all at once. We introduce a novel 5+2 scoring rubric measuring seven dimensions of reasoning quality beyond binary accuracy, and a 6-code failure mode taxonomy enabling mechanistic root-cause analysis of diagnostic failures. We document Convergence Regression (CR): a systematic failure mode where models correctly identify diagnoses at intermediate reasoning stages but abandon them when subsequent evidence triggers pattern-matching to alternative diagnoses. Under unstructured sequential delivery, models access the correct diagnosis in 90% of cases but retain it in only 60%, creating a 30% Access-Stability Dissociation invisible under single-shot evaluation. A structured scaffold, the Sequential Information Prioritization Scaffold (SIPS), eliminates this gap entirely through forced hypothesis accountability: 80% access, 80% final accuracy, 0% Convergence Regression. We term this the SIPS Retention Effect. However, scaffolding reduces top-1 accuracy from 60% to 40%, a Convergence Hesitancy Paradox establishing that retention and convergence are architecturally distinct reasoning tasks requiring separate mechanisms. We propose that structured scaffolding functions as a diagnostic sensor for reasoning pathology rather than an accuracy intervention: it makes failure modes visible, classifiable, and auditable. We demonstrate that our measurement instruments operationalize WHO and FDA governance requirements for AI transparency, accountability, and safety into quantifiable scores. We release the complete framework, including the 5+2 rubric, 6-code taxonomy, scaffold specification, and 210-score matrix with adjudication rationale, as a reusable audit instrument for evaluating LLM reasoning behavior in any sequential reasoning context. The study evolved across three analytical phases: N=50 aggregate ablation establishing population-level scaffold effects; stratified N=10 mechanistic case analysis characterising the specific failure mode and its structural remedy; and N=10 cross-model replication across three architecturally distinct LLMs (Claude Sonnet 4, GPT-4o, Llama 3.3-70B) testing generalisability. A subsequent multi-model validation study confirms that core C3 process properties -- Hypothesis Tracking universality (5.0/5.0) and Step Adherence (4.9-5.0) -- replicate across GPT-4o and Llama 3.3-70B under identical protocols. The Convergence Hesitancy Paradox, while present in GPT-4o, is absent in Claude and Llama, establishing that the scaffold measures model-specific reasoning profiles rather than imposing a single fixed performance trade-off.
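The Access-Stability Dissociation can be computed from staged outputs with a few lines; the data structure below is an assumption, not the paper's released format:

    def access_stability(runs):
        """Access vs. retention over staged diagnostic runs. Each run is a
        list of differential-diagnosis lists (one per information stage)
        paired with the gold diagnosis."""
        accessed = retained = 0
        for stages, gold in runs:
            if any(gold in ddx for ddx in stages):
                accessed += 1                  # correct dx appeared at some stage
                if gold in stages[-1]:
                    retained += 1              # ...and survived to the final stage
        n = len(runs)
        return accessed / n, retained / n, (accessed - retained) / n

    runs = [
        ([["flu"], ["flu", "sepsis"], ["sepsis"]], "sepsis"),                      # retained
        ([["gout"], ["gout", "septic arthritis"], ["gout"]], "septic arthritis"),  # regressed
    ]
    print(access_stability(runs))   # (1.0, 0.5, 0.5): access, retention, dissociation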
Debnath, A.; Sarkar, S.
Background: Alzheimer's disease (AD) causes progressive decline in language and cognition. Automated speech analysis has emerged as a promising screening tool, yet clinical data scarcity limits progress. To address this, we generated a large-scale simulated speech dataset to model linguistic and acoustic deterioration across cognitive stages: Control, Mild Cognitive Impairment (MCI), and AD.
Methods: Using Monte Carlo simulations, we emulated the Pitt DementiaBank "Cookie Theft" narratives. Acoustic features (speech rate, pause duration, jitter, shimmer) and linguistic features (type-token ratio, unique-word count, filler usage) were synthetically sampled from real-world DementiaBank distributions. We trained an XGBoost classifier to distinguish diagnostic groups, and applied SHAP (Shapley Additive exPlanations) to assess feature importance.
Results: The model achieved high discriminative performance (AUC ≈ 0.94; accuracy ≈ 85%). Compared to controls, simulated MCI and AD groups showed progressive declines in fluency and lexical diversity, and increases in disfluencies and voice instability. SHAP analysis revealed that key predictors included reduced type-token ratio, higher pause and filler rates, and elevated jitter/shimmer. Classification was most accurate for Control vs. AD; MCI misclassifications highlighted intermediate profiles.
Interpretation: Our framework, FMN (Forget Me Not), captures clinically relevant speech changes using simulated data, offering an explainable and scalable approach for cognitive screening. While not a substitute for real datasets, FMN validates a pipeline that mirrors known AD markers and can guide future real-world deployments. External validation remains a key next step for translational impact.
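A skeletal version of the classifier-plus-explanation pipeline described (simulated features with assumed names and distributions; not the FMN code):

    import numpy as np
    import pandas as pd
    import xgboost as xgb
    import shap

    # Hypothetical feature table mirroring the abstract's speech markers
    rng = np.random.default_rng(9)
    n = 900
    X = pd.DataFrame({
        "speech_rate": rng.normal(3.0, 0.5, n), "pause_dur": rng.gamma(2, 0.3, n),
        "jitter": rng.gamma(2, 0.005, n), "shimmer": rng.gamma(2, 0.02, n),
        "type_token_ratio": rng.beta(5, 3, n), "filler_rate": rng.gamma(2, 0.05, n),
    })
    y = rng.integers(0, 3, n)   # 0 = Control, 1 = MCI, 2 = AD (toy labels)

    clf = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="mlogloss")
    clf.fit(X, y)

    explainer = shap.TreeExplainer(clf)
    shap_values = explainer.shap_values(X)   # per-feature contributions to each
                                             # prediction, as in the paper's
                                             # feature-importance analysis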